Abstract

This paper proposes a novel algorithm for the prob- 
lem of structural image segmentation through an interac- 
tive model-based approach. Interaction is expressed in the 
model creation, which is done according to user traces 
drawn over a given input image. Both model and input are 
then represented by means of attributed relational graphs 
derived on the fly. Appearance features are taken into ac- 
count as object attributes and structural properties are ex- 
pressed as relational attributes. To cope with possible topo- 
logical differences between both graphs, a new structure 
called the deformation graph is introduced. The segmenta- 
tion process corresponds to finding a labelling of the in- 
put graph that minimizes the deformations introduced in 
the model when it is updated with input information. This
approach has been shown to be faster than other segmentation
methods while achieving competitive output quality, and it
therefore handles multiple-label segmentation efficiently.
Encouraging results on both natural and target-specific color
images, as well as examples showing the reusability of the
model, are presented and discussed.



1. Introduction 

Semi-automated, or interactive, image segmentation methods
have been used successfully in different applications
whenever human knowledge can be provided as initial guiding
clues for the segmentation process. Examples of such methods
are the region-growing technique, marker-based watersheds [?],
the IFT [?], graph cuts and Markov random fields [?] [14] [?],
among others.

Another source of a priori information for segmentation is
the image model, which consists of representative instances
of the desired objects and conveys different types of
features (e.g. color, shape, geometry, relations) that
describe such entities. Model-guided approaches are widely
used for a variety of image processing purposes, such as
medical imaging [9] [?] [12] [13], face recognition and
tracking [?] [6], and OCR [10].

Although the aforementioned interactive approaches have made
remarkable contributions to the image segmentation domain,
most of them do not use image structure to aid the
segmentation procedure. An attributed relational graph (ARG)
is a particularly useful representation, not only for
embedding structural information when modelling a problem,
but also for expressing appearance features.

The present paper proposes a new algorithm for segmenting
color images using both interactive cues and a model under an
ARG-based representation. An object (fig. 1, left) is
considered to be a set of parts (subsets of the pixels of an
image) and their relations. An object model image is defined
by a user according to traces drawn over the input (fig. 1,
right). The input and model images are then represented by
means of attributed relational graphs, in which objects and
their relations are represented, respectively, as vertices
and edges. Under this formulation, the segmentation problem
is viewed as a graph matching procedure.

The introduced algorithm substantially improves the approach
described in [4], in which structure was taken into account
when segmenting an input gray-scale image according to a
model, but the structures under comparison, represented by
ARGs, often presented different topologies. Such differences
make it difficult to determine a suitable mapping between the
input and model graphs and incur a high computational cost.
As fig. 2 shows, the graph matching problem admits many
possible solutions. The optimization procedure therefore has
to consider not only vertex similarities, but also various
structural match configurations, in order to rule out those
which are not plausible for a final segmentation. However,
this evaluation can be misleading when the two topologies are
distinct and may cause the method to overlook potential
solutions. This happens, for instance, when evaluating a
match between an input edge connecting two input vertices
that represent adjacent oversegmented regions belonging to
two distinct objects and a model edge connecting the model
vertices that represent those same two objects.

Figure 1. Objects as parts: in this definition, an object
might be a whole scene subdivided into parts (buffalos,
background), or a single entity (a person) subdivided into
meaningful parts (face, clothing, hair). All parts are
defined by the user through traces over the regions of
interest, and each color represents an object label.

In this paper, we propose a novel algorithm for the graph
matching step. Each possible matching of an input ARG vertex
to a model ARG vertex is seen as a deformation of the model
graph (fig. 2). This is expressed through the deformation
ARG, an altered version of the model ARG that preserves its
topological properties while incorporating attributes from
the given input vertex. This new interpretation addresses the
problem of matching two topologically different structures
and results in a significantly faster segmentation method.

This paper is organized as follows. Section 2 presents our
formal definition of attributed relational graphs for image
representation. Section 3 gives an overview of all the steps
of the proposed segmentation method. Section 4 describes the
problem of image segmentation as a graph matching task,
whereas Section 5 introduces the proposed segmentation
algorithm, based on an optimization technique for matching
the input and model graphs. Finally, experimental results are
discussed in Section 6, and a few conclusions, as well as
suggestions for future work, are the topic of Section 7.

Figure 2. Matching of two topologically different attributed
relational graphs versus two topologically equal ones under a
deformation point of view.



2. Graph-based representation 

The representation of images using graphs and the matter of
graph matching applied to pattern recognition and image
processing have been explored in a variety of situations,
such as those reported in the works of Bunke [2] and Conte
et al. [5], as well as in the method proposed by Felzenszwalb
and Huttenlocher [?] and the classic work of Wilson and
Hancock [?], among others.

In this paper, an attributed relational graph (ARG) is a
directed graph formally expressed as a tuple
$G = (V, E, \mu, \nu)$, where $V$ stands for its set of
vertices and $E \subseteq V \times V$ for its set of edges.
Typically, a vertex represents a single image region (a
subset of image pixels) and an edge is created between
vertices representing two image regions.
$\mu : V \to L_V$ assigns an object attribute vector to each
vertex of $V$, whereas $\nu : E \to L_E$ assigns a relational
attribute vector to each edge of $E$. $|V|$ denotes the
number of vertices in $V$, while $|E|$ denotes the number of
edges in $E$.
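
As an illustration of this definition, the tuple $(V, E, \mu, \nu)$ can be held in a small container in which the two attribute maps double as the vertex and edge sets. This is only a minimal sketch: the class name `ARG` and its methods are hypothetical and are not taken from the implementation described in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]
Vector = Tuple[float, ...]


@dataclass
class ARG:
    """Attributed relational graph G = (V, E, mu, nu) (hypothetical container)."""

    mu: Dict[Vertex, Vector] = field(default_factory=dict)  # object attributes
    nu: Dict[Edge, Vector] = field(default_factory=dict)    # relational attributes

    def add_vertex(self, v: Vertex, attributes: Vector) -> None:
        self.mu[v] = tuple(attributes)

    def add_edge(self, v: Vertex, w: Vertex, attributes: Vector) -> None:
        # E is a subset of V x V, so both endpoints must already be vertices.
        if v not in self.mu or w not in self.mu:
            raise KeyError("both endpoints must be vertices of the graph")
        self.nu[(v, w)] = tuple(attributes)

    @property
    def vertices(self):
        return set(self.mu)

    @property
    def edges(self):
        return set(self.nu)


g = ARG()
g.add_vertex("a", (0.2, 0.4, 0.6))
g.add_vertex("b", (0.9, 0.1, 0.1))
g.add_edge("a", "b", (0.25, 0.0))
print(len(g.vertices), len(g.edges))  # |V| and |E|
```

Keeping $\mu$ and $\nu$ as dictionaries keyed by vertex and by ordered vertex pair makes the directed nature of the graph explicit, since `("a", "b")` and `("b", "a")` are distinct keys.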

For color images, the object attribute vector is composed of
the three average RGB values which characterize the
corresponding image region, i.e. $\mu(v) = (R_v, G_v, B_v)$.
When dealing with gray-scale images, $\mu(v) = (g(v))$, where
$g(v)$ denotes the average gray-level of the image region
associated with vertex $v \in V$. Each component of $\mu(v)$
is normalized between 0 and 1 with respect to the minimum and
maximum possible gray-levels. Similarly, the relational
attribute of an edge $e = (v, w) \in E$, $v, w \in V$, is
defined as $\nu(e) = (p_w - p_v)/(2 d_{max})$, where $p_v$
and $p_w$ are the centroids of their respective corresponding
image regions. The factor $d_{max}$ is the largest distance
between any two points of the input image region. Other
attributes may easily be employed, since the methodology
presented herein does not impose any restriction on the
nature of $\mu$ and $\nu$.
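
The two attribute definitions can be illustrated on a toy image with two regions. The sketch below assumes 8-bit channels (maximum level 255) and takes $d_{max}$ as the diagonal between the corner pixel centres of the image. The function names are hypothetical, and the factor 2 follows the expression for $\nu(e)$ as read here.

```python
import math


def object_attribute(pixels, max_level=255):
    """mu(v): average RGB of a region, each channel scaled to [0, 1]."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / (n * max_level) for c in range(3))


def centroid(coords):
    """Centroid p_v of a region given as a list of (x, y) pixel positions."""
    n = len(coords)
    return (sum(x for x, _ in coords) / n, sum(y for _, y in coords) / n)


def relational_attribute(p_v, p_w, d_max):
    """nu(e) = (p_w - p_v) / (2 * d_max) for an edge e = (v, w)."""
    return ((p_w[0] - p_v[0]) / (2 * d_max), (p_w[1] - p_v[1]) / (2 * d_max))


# Toy 4x2 image: region v covers the left half, region w the right half.
width, height = 4, 2
d_max = math.hypot(width - 1, height - 1)  # assumption: image diagonal

region_v = [(0, 0), (1, 0), (0, 1), (1, 1)]
region_w = [(2, 0), (3, 0), (2, 1), (3, 1)]
colors_v = [(255, 0, 0)] * 4
colors_w = [(0, 0, 255), (0, 0, 255), (0, 0, 51), (0, 0, 51)]

mu_v = object_attribute(colors_v)
mu_w = object_attribute(colors_w)
nu_vw = relational_attribute(centroid(region_v), centroid(region_w), d_max)
print(mu_v, mu_w, nu_vw)
```

Because $\nu(e)$ is a normalized difference of centroids, it encodes both the distance and the direction from one region to the other, which is what the structural term of the cost function later compares.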

For the purpose of the segmentation method, three instances
of such ARGs are considered: an input ARG
$G_i = (V_i, E_i, \mu_i, \nu_i)$, derived from the input
image; a model ARG $G_m = (V_m, E_m, \mu_m, \nu_m)$,
representing the objects of interest selected by the user;
and a deformation ARG $G_d = (V_d, E_d, \mu_d, \nu_d)$, used
as an auxiliary data structure for measuring the deformations
implied in the model when matching a vertex $v \in V_i$ to
another $w \in V_m$.

Subscripts shall be used to denote the corresponding graph,
e.g. $v_i \in V_i$ denotes a vertex of $G_i$, whereas
$(v_i, w_i) \in E_i$ denotes an edge of $G_i$. Similar
notations are used for $G_m$ and $G_d$ as well.

3. Methodology overview 

The segmentation process is depicted step by step in fig. 3.
Given an input image to be segmented, the user first points
out the target objects by placing traces over the input, thus
creating a model image in which each color identifies an
object of interest. Next, an oversegmentation is performed
using the watershed algorithm to obtain a partition image in
which the real contours of each object are present.

This oversegmented image is used to create both an input ARG
$G_i$ and a model ARG $G_m$. The former is obtained as
follows: each watershed region gives rise to a vertex and its
attributes, whereas each pair of adjacent regions gives rise
to an edge and its respective attributes. $G_m$ is obtained
similarly, but only those watershed regions which intersect
the user-defined traces result in a model vertex. Clearly,
the input and model ARGs present different topologies, and
this fact must be accounted for when using structure as a
segmentation guide.
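
The construction of the two vertex and edge sets can be sketched on a small label grid. The sketch makes several assumptions: the watershed output is given as a grid of region labels, adjacency is 4-connectivity, traces are a same-sized grid in which 0 means "no trace", and model edges are the input edges restricted to model vertices. The function names are hypothetical.

```python
def region_adjacency(labels):
    """Ordered pairs (a, b) of 4-adjacent regions in a label grid."""
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    b = labels[rr][cc]
                    edges.add((a, b))
                    edges.add((b, a))
    return edges


def build_vertex_sets(labels, traces):
    """Input vertices: every region. Model vertices: regions hit by a trace.

    Returns the set of input vertices and a dict mapping each model vertex
    (a region identifier) to the object label of the stroke that crosses it.
    """
    input_vertices = {region for row in labels for region in row}
    model_vertices = {}
    for label_row, trace_row in zip(labels, traces):
        for region, trace in zip(label_row, trace_row):
            if trace != 0:
                model_vertices.setdefault(region, trace)
    return input_vertices, model_vertices


# Toy oversegmentation with four regions and two user strokes (labels 7, 9).
labels = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
traces = [
    [7, 0, 0, 0],
    [0, 0, 0, 9],
    [0, 0, 0, 0],
]

input_vertices, model_vertices = build_vertex_sets(labels, traces)
input_edges = region_adjacency(labels)
# Assumption: model edges are the input edges between two model vertices.
model_edges = {(a, b) for a, b in input_edges
               if a in model_vertices and b in model_vertices}
print(sorted(input_vertices), model_vertices, sorted(model_edges))
```

Even in this toy case the model has two vertices while the input has four, which is the topological difference the matching step has to cope with.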

Since the topological discrepancy is due to the
oversegmentation caused by the watershed, the final
segmentation should be a mapping of all $v_i \in V_i$ such
that input vertices related to image regions corresponding to
the same model object are assigned to the same model vertex.
This is equivalent to merging the regions of an oversegmented
object into a single region. The mapping of vertices from
$G_i$ to those in $G_m$ characterizes a graph matching
problem [?]. Although many mappings are possible, a desirable
solution should correspond to an image partition as similar
as possible to that defined by the model. Thus, to ensure
that the final mapping follows the model ARG topology, the
deformation ARG $G_d$ is introduced. This graph is
initialized as a copy of $G_m$ and is used to evaluate the
local deformation effect that a given assignment between a
vertex $v_i \in V_i$ and another $v_m \in V_m$ induces in the
model. The pursued solution is one that minimizes such
effects. The next section discusses how these deformations
are computed and how they fit into the solution of the graph
matching problem.

Figure 3. Overview of the methodology steps. The ARGs are
depicted only with their vertices for better visualization.

4. The graph-matching algorithm for model-based image segmentation

A segmentation of the input image according to the model
under the graph-based representation is a solution to the
graph matching problem between $G_i$ and $G_m$, characterized
as a mapping $f : V_i \to V_m$. This implies finding a
corresponding model vertex for each input vertex. Clearly,
there are $|V_m|$ possible assignments for each input vertex,
and the decision of which to choose depends on an
optimization procedure.

Let $G_d$ be an ARG initially equal to $G_m$, i.e.,
$V_d = V_m$, $E_d = E_m$, $\mu_d = \mu_m$, and
$\nu_d = \nu_m$. Let also $v_d$ and $v_m$ be two
corresponding vertices in $G_d$ and $G_m$, respectively.
Suppose that an assignment from a vertex $v_i \in V_i$ to a
vertex $v_m$ is under consideration. The quality of such an
assignment may be assessed by computing the deformation which
occurs in the model when this new vertex is merged with
$v_d$, causing the attributes $\mu_d(v_d)$ and $\nu_d(v_d)$
to be fused with those from $v_i$. After such a merge, $G_d$
becomes a distorted version of the model in which:

$$\mu_d(v_d) = \left( \frac{R_{v_d} + R_{v_i}}{2}, \frac{G_{v_d} + G_{v_i}}{2}, \frac{B_{v_d} + B_{v_i}}{2} \right) \qquad (1)$$

and

$$\nu_d(e) = (p_{w_d} - p_{v_d}) / (2 d_{max}) \qquad (2)$$

$\forall e \in E_d(v_d) = \{ e \in E_d : e = (v_d, w_d) \text{ or } e = (w_d, v_d), w_d \in V_d \}$
and with $p_{v_d} = (p_{v_m} + p_{v_i})/2$.

The impact of such deformation is then measured according to
the following cost function:

$$f(G_d, G_m) = \alpha \, c_V(v_d, v_m) + \frac{(1 - \alpha)}{|E_d(v_d)|} \sum_{e \in E_d(v_d)} c_E(e, e_m) \qquad (3)$$

The term $c_V(v_d, v_m)$ is a measure of the deformation
between the object attributes of $v_d$ and $v_m$ and it is
defined as:

$$c_V(v_d, v_m) = \ldots \qquad (4)$$

Although the RGB color space was chosen to describe the color
appearance feature of an object, other color spaces, such as
Lab, might as well be used with an appropriate adaptation of
the metric.

Similarly, if $e \in E_d(v_d)$ and $e_m \in E_m$ is its
corresponding edge, then $c_E(e, e_m)$ is a measure of the
deformation between both edges, defined as:

$$c_E(e, e_m) = \ldots \qquad (5)$$

The value $\theta$ is the angle between $\nu(e)$ and
$\nu(e_m)$, whereas the parameter $\gamma_E$,
$0 \le \gamma_E \le 1$, controls the weights of the modulus
and angular dissimilarities. Thus, the total impact caused on
the edges directly connected to $v_d$ is computed from the
modulus and angular differences between the relational
attribute vectors of each pair $(e, e_m)$.

Therefore, the cost function measures how the merging of a
vertex $v_i$ with a copy $v_d$ of a model vertex affects the
local structure of the graph, as well as the appearance
attributes it holds. The parameter $\alpha$,
$0 \le \alpha \le 1$, controls the relative importance of the
appearance and structural effects of all vertex mappings.

5. The optimization algorithm

To map $G_i$ to $G_m$ and estimate the adequacy of each
vertex assignment using $G_d$, the following algorithm was
devised:

All possible assignments for each vertex $v_i \in V_i$ are
evaluated and the final label of $v_i$ corresponds to the
model vertex which was affected the least by the deformation
caused by the introduction of $v_i$. At each iteration, a
vertex from $V_i$ receives a label; $f_{min}$ is the minimum
deformation cost obtained so far and $minlbl$ is the
corresponding model vertex to which $v_i$ was mapped,
resulting in such minimal deformation. The output of the
algorithm is a mapping of all vertices of $V_i$ to vertices
of $V_m$.
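
The labelling loop described above can be sketched as follows. This is an illustrative reading, not the authors' listing. The merge averages colors and centroids and recomputes the incident edge vectors. The two dissimilarity terms `c_v` and `c_e` are assumed stand-ins (a scaled Euclidean distance, and a weighted sum of angular and modulus differences), because the exact expressions are not reproduced here. The dictionary layout and all function names are hypothetical.

```python
import math


def nu(p_a, p_b, d_max):
    """Relational attribute of a directed edge (a, b)."""
    return ((p_b[0] - p_a[0]) / (2 * d_max), (p_b[1] - p_a[1]) / (2 * d_max))


def c_v(mu_d, mu_m):
    # Assumed appearance term: Euclidean distance scaled to [0, 1].
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_d, mu_m)))
    return dist / math.sqrt(len(mu_m))


def c_e(nu_d, nu_m, gamma):
    # Assumed structural term: angular part weighted by gamma, modulus part
    # weighted by (1 - gamma). A zero-length vector has no defined angle,
    # so the angular part is taken as 0 in that case.
    len_d, len_m = math.hypot(*nu_d), math.hypot(*nu_m)
    angular = 0.0
    if len_d > 0 and len_m > 0:
        cos_theta = (nu_d[0] * nu_m[0] + nu_d[1] * nu_m[1]) / (len_d * len_m)
        angular = (1 - max(-1.0, min(1.0, cos_theta))) / 2
    return gamma * angular + (1 - gamma) * abs(len_d - len_m)


def deformation_cost(v_i, v_m, model, d_max, alpha, gamma):
    """Cost of merging input vertex v_i into the copy v_d of model vertex v_m."""
    mu_m, p_m, edges = model["mu"], model["p"], model["edges"]
    mu_d = tuple((a + b) / 2 for a, b in zip(mu_m[v_m], v_i["mu"]))
    p_d = tuple((a + b) / 2 for a, b in zip(p_m[v_m], v_i["p"]))
    incident = [e for e in edges if v_m in e]  # edges touching v_d
    structural = 0.0
    for a, b in incident:
        p_a = p_d if a == v_m else p_m[a]
        p_b = p_d if b == v_m else p_m[b]
        structural += c_e(nu(p_a, p_b, d_max),
                          nu(p_m[a], p_m[b], d_max), gamma)
    if incident:
        structural /= len(incident)
    return alpha * c_v(mu_d, mu_m[v_m]) + (1 - alpha) * structural


def match(input_vertices, model, d_max, alpha=0.5, gamma=0.5):
    """Assign to each input vertex the model vertex it deforms the least."""
    labelling = {}
    for name, v_i in input_vertices.items():
        f_min, min_lbl = math.inf, None
        for v_m in model["mu"]:
            cost = deformation_cost(v_i, v_m, model, d_max, alpha, gamma)
            if cost < f_min:
                f_min, min_lbl = cost, v_m
        labelling[name] = min_lbl
    return labelling


model = {
    "mu": {"sky": (0.2, 0.4, 0.9), "grass": (0.1, 0.7, 0.2)},
    "p": {"sky": (5.0, 2.0), "grass": (5.0, 8.0)},
    "edges": {("sky", "grass"), ("grass", "sky")},
}
input_vertices = {
    "r1": {"mu": (0.25, 0.45, 0.85), "p": (2.0, 1.0)},
    "r2": {"mu": (0.15, 0.65, 0.25), "p": (7.0, 9.0)},
}
d_max = math.hypot(10, 10)
labelling = match(input_vertices, model, d_max)
print(labelling)
```

Each input vertex is evaluated against a fresh copy of the model vertex, so the model itself is never modified and the result does not depend on the order in which input vertices are visited.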




Figure 4. Sample segmentation results: orig- 
inal images (left col.), images with user-input 
strokes defining regions of interest (center 
col.) and corresponding resulting partitions 
(right col.). 



6. Experimental results 

In order to test the presented technique, a Java application
was designed and implemented¹. Its interface allows the user
to load images to be segmented, define a model according to
traces drawn over different regions of interest, and choose
the $\alpha$ and $\gamma$ parameters of the cost function
(eq. 3), thereby specifying how to weigh structure against
appearance features.

Figure 4 shows a few segmentation results for the application
of the methodology to natural color images obtained from the
Berkeley Image Segmentation Database². All resulting images
depict the final regions labelled according to the color of
the traces defined by the user. Transparency was used over
the labelled segmented images in order to visualize more
precisely which areas have been classified as a given object.
Although certain image regions, such as the mountains and the
face, present high variability due to textures or to
different objects grouped as one in the model, the final
segmentation remains accurate and robust thanks to the
structural constraints embedded in the model.

Tests for the reusability and robustness of the model were
also performed on sample frames of a moving head video
(fig. 5, top) from the XM2VTS Database³ and on a set of
similar images retrieved randomly from the web (fig. 5,
bottom). Each model was defined once by the user in the first
image of each set and then applied to the other similar
images. The latter step is accomplished interactively as
follows: once the user draws the traces over the first image,
a minimum enclosing rectangle of the strokes is automatically
defined. This rectangle, called a stamp, can later be applied
by the user to other images, and segmentation can be
performed within that area.
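
The stamp can be illustrated with a few lines of code. The sketch assumes that the minimum enclosing rectangle is axis-aligned and that a stroke is a list of $(x, y)$ points. Both function names are hypothetical.

```python
def stamp_from_strokes(strokes):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) enclosing all strokes."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def place_stamp(stamp, top_left):
    """Translate the stamp so that its top-left corner sits at `top_left`."""
    x0, y0, x1, y1 = stamp
    nx, ny = top_left
    return nx, ny, nx + (x1 - x0), ny + (y1 - y0)


strokes = [[(12, 30), (18, 34), (25, 33)], [(40, 8), (42, 15)]]
stamp = stamp_from_strokes(strokes)
moved = place_stamp(stamp, (100, 50))
print(stamp)  # (12, 8, 42, 34)
print(moved)  # (100, 50, 130, 76)
```

Only the rectangle is repositioned by the user. The model graph derived from the first image stays unchanged.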

Note that the model ARG is derived only once, from the first
image, and is then used in the segmentation of all other
input images. It is important to notice that simply applying
the same strokes to the other images would not produce the
same model as the one obtained the first time, as fig. 6
depicts. This also shows that the model is robust enough to
handle small variations among the input images under
analysis. For each image set, the overall structure remained
similar and the segmentation was once again satisfactory,
even though the model was derived from an image with
different appearance features.

The new algorithm proposed for the graph matching step is
faster than the algorithm reported in [4]. While the present
algorithm runs in $O(|V_i||V_m|)$ time, the other is bounded
by a function [...]. Besides this, the optimization algorithm
does not depend on the order in which vertices from the input
are labelled, since each vertex is treated separately when
analysed during the graph matching step.

7. Conclusion 

This paper proposed a novel algorithm for performing in- 
teractive model-based image segmentation using attributed 
relational graphs to represent both model and input images. 
This approach allows the usage of information ranging from 
appearance features to structural constraints. Topological 
differences between graphs are dealt with by means of a 
deformation ARG, a structure which allowed the design of 
an optimization algorithm for graph matching that evaluates 
possible solutions according to the local impacts (or
deformations) they cause on the model. The faster performance
of the algorithm in comparison with the one proposed in [4],
the reusability of the model graph when segmenting several
images, and the satisfactory quality of the results due to
the adequate use of structural information characterize the
main contributions of the method.



1 Please refer to the accompanying demo video. 

2 http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/

3 http://www.ee.surrey.ac.uk/CVSSP/xm2vtsdb/






Figure 5. Sample segmentation results after 
applying the same user-defined stroke model 
(top left image in each set) to different im- 
ages. 



Our ongoing work is devoted to reducing interaction 
when reusing the model to segment various images. For 
now, it is required that the user places the stamp over the 
area of interest of the image. In the future, we hope to be 
able to apply the model ARG without the need of this inter- 
active positional information. This shall be accomplished 
through the investigation of MAP-MRF methods applied 
within this framework in order to make more robust mod- 
els and improve segmentation quality under different con- 
ditions such as object translation and rotation. Furthermore, 
we intend to perform a quantitative study to compare the ac- 
curacy of our results with those of other related methods. 




Figure 6. Replication of the model: simply 
reusing the user-defined strokes over dif- 
ferent images does not guarantee that the 
model to be derived is always consistent, 
since the strokes could fall over distinct ob- 
jects from one image to another and the final 
segmentation would be compromised. 